Stochastic Unsupervised Learning on Unlabeled Data

نویسندگان

  • Chuanren Liu
  • Jianjun Xie
  • Yong Ge
  • Hui Xiong
چکیده

In this paper, we introduce a stochastic unsupervised learning method that was used in the 2011 Unsupervised and Transfer Learning (UTL) challenge. This method is developed to preprocess the data that will be used in the subsequent classification problems. Specifically, it performs K-means clustering on principal components instead of raw data to remove the impact of noisy/irrelevant/less-relevant features and improve the robustness of the results. To alleviate the overfitting problem, we also utilize a stochastic process to combine multiple clustering assignments on each data point. Finally, promising results were observed on all the test data sets. Indeed, this proposed method won us the second place in the overall performance of the challenge.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...

متن کامل

Learning from partially labeled data

The Problem: Learning from data with both labeled training points (x,y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but they cannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabeled data distribution, but do not exploit the labels. Thus, this task falls between traditional s...

متن کامل

Semi-Supervised Learning with Sparse Autoencoders in Automatic Speech Recognition

This work is aimed at exploring semi-supervised learning techniques to improve the performance of Automatic Speech Recognition systems. Semi-supervised learning takes advantage of unlabeled data in order to improve the quality of the representations extracted from the data. The proposed model is a neural network where the weighs are updated by minimizing the weighted sum of a supervised and an ...

متن کامل

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

متن کامل

Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to be overfitting due to small number of training data. As a result, some researchers have focused on using unlabeled data which may not necessary to follow the same generative distribution as the labeled data to construct a high-level feature for improving performance o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012